perm filename FIND.DON[UP,DOC]8 blob sn#508871 filedate 1980-05-02 generic text, type T, neo UTF8
FIND is a system command that causes a search for a specified key in a
specified file.  The FIND command has the following syntax:

  FIND[ WITHIN <delim>][ SURROUND <num>] <key>[ OMIT[TING] <omits>]
					[ IN <file>][ WRITING <file>]

where [] indicates optional elements.  There are also DFIND and OFIND
commands with the same syntax (except for the command name).  <key> is
the string of characters to be found in the file <file>, <omits> are
characters to be ignored in the file, <num> is the maximum number of lines
to print on either side of a "hit", and <delim> is either a single
character or one of the following words:  MSG LINE PAGE PARAGRAPH GRAF.

Now for some details.


FILE TO BE SEARCHED

The default file to be searched is the lab phone directory.  If you use
the DFIND command, the default is the unabridged dictionary word list.
You can specify your own default via the OFIND command (see below).

If you specify a file other than the default, certain special names are
recognised:  PHONE gets you the phone directory (without having to type
its full name), DICT gets you the dictionary, FORWARD gets you the file
of mail-forwarding entries, and ∂ (partial sign) gets you your mail
file.  You can optionally follow the ∂ with (1) a programmer name to
specify a mail file other than your own, or an asterisk (*) to specify
the system message mail file NOTICE.TXT, and/or (2) an extension to
specify one other than .MSG (or .TXT), and/or (3) a PPN to specify one
other than [2,2].
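
As a rough illustration of those defaults, here is a Python sketch (not
the FIND source).  The function name resolve_mail_spec, the argument
"me" (your own programmer name), and the exact layout assumed for the
specifier, namely ∂ then name-or-asterisk then .extension then [PPN],
are all assumptions made for the example; only the defaults themselves
come from the description above.

```python
import re

# Assumed layout: ∂ [name | *] [.ext] [[ppn]]  (hypothetical, for illustration)
_SPEC = re.compile(r"∂(\*|[A-Za-z0-9]*)(?:\.([A-Za-z0-9]*))?(?:\[([^\]]*)\])?")


def resolve_mail_spec(spec, me):
    """Expand a partial-sign specifier into a full file name string."""
    m = _SPEC.fullmatch(spec)
    if m is None:
        raise ValueError("not a ∂ file specifier: " + spec)
    who, ext, ppn = m.groups()
    if who == "*":
        # Asterisk selects the system message file NOTICE.TXT.
        name, default_ext = "NOTICE", "TXT"
    else:
        # No name given means your own mail file, extension .MSG.
        name, default_ext = (who or me), "MSG"
    return "%s.%s[%s]" % (name.upper(), (ext or default_ext).upper(), ppn or "2,2")


if __name__ == "__main__":
    for spec in ("∂", "∂*", "∂.nap", "∂JMC"):
        print(spec, "->", resolve_mail_spec(spec, "LES"))
```

Under these assumptions, ∂.nap typed by programmer LES would come out
as LES.NAP[2,2].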


WHAT GETS PRINTED

When a match is found, the "unit" of the file that includes the LAST
character of the key is printed, where the "unit" is determined by the
<delim>.  The default unit is the PARAGRAPH except for the DFIND
command, for which the default is the LINE.  If the partial-sign (mail)
filename specifier is used, the default unit is MSG, and if the FORWARD
filename is used, the default unit is LINE even if you forgot to use the
DFIND command.  You can override these defaults using OFIND or the
WITHIN clause (see syntax above).  The delimiter <delim> can be
specified as a single character or it can be specified by one of the
following exact names:

	MSG  LINE  PAGE  PARAGRAPH  GRAF

where MSG means that partial sign (∂) is the delimiter (designed for use
with mail files), LINE means that the end of a line is the delimiter
(i.e., only the line on which the key ends will be typed), PAGE means
that formfeed (a pagemark) is the delimiter, and PARAGRAPH or GRAF means
that a blank line or pagemark is the delimiter.

The delimiter character will be treated as a delimiter only if it occurs
as the first character on a line; a line starting with the delimiter
character is considered the first line in a new text unit, and the
previous line is the last line in the previous text unit.  Text units
may span page boundaries (except for PAGE, PARAGRAPH, or GRAF); the
pagemarks are not printed.
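
The start-of-line rule can be pictured with a short Python sketch.
This is only an illustration, not the FIND source: the names
starts_new_unit and split_units are made up, a pagemark is assumed to
show up as a line beginning with a formfeed character, and a blank
line is assumed to count as the first line of the paragraph it opens.

```python
FORMFEED = "\f"  # assumption: a pagemark appears as a line starting with formfeed


def starts_new_unit(line, delim):
    """True if this line begins a new text unit for the given <delim>."""
    if delim == "LINE":
        return True
    if delim == "MSG":
        return line.startswith("∂")
    if delim == "PAGE":
        return line.startswith(FORMFEED)
    if delim in ("PARAGRAPH", "GRAF"):
        return line.strip(" \t") == "" or line.startswith(FORMFEED)
    # Otherwise <delim> is a single character, honoured only in column one.
    return line.startswith(delim)


def split_units(lines, delim):
    """Group a list of lines into text units (each unit is a list of lines)."""
    units = []
    for line in lines:
        if not units or starts_new_unit(line, delim):
            units.append([])
        units[-1].append(line)
    return units


if __name__ == "__main__":
    mail = ["∂01-May JMC", "see a∂b below", "∂02-May LES", "bye"]
    print(split_units(mail, "MSG"))
```

Note that the ∂ in the middle of the second line does not start a new
unit, since it is not the first character on its line.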

Within a single delimited text unit, up to 25 lines are typed out both
before and after the line in which the key is found.  If more than 25
lines occur before and/or after the key but within the delimited text
area, an ellipsis (. . .) will be typed out before the first line typed
out and/or after the last line typed out.  Further occurrences of the
key within the same text unit will not be detected unless the 25-line
limit is exceeded and the key occurs entirely after the last line typed.
Normally (i.e., if the text unit ends within 25 lines after the key)
the search picks up starting with the CRLF (carriage return line feed)
ending the last line printed (thus the CRLF can be used as part of the
key to search for occurrences only at the beginnings of lines).  The
25-line limit can be extended or truncated using the SURROUND clause in
the command line.

Each separate text unit containing the key is a "HIT" and is typed out,
with each line of text preceded by an asterisk (*) except that the line
in which the key occurs is preceded by a greater-than sign (>).  Also,
the hits are counted and the count is printed after the whole file has
been searched.  (Multiple hits within a single text unit may occur; see
preceding paragraph.)  In the printout the hits are separated by blank
lines.  EXCEPTION: If the <delim> is LINE, then no blank lines are
inserted and the "*" and ">" prefixes are omitted.
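
The printing rules above can be summarised in a few lines of Python.
This is a sketch under assumptions, not the actual program: the
function name format_hit and its parameters are invented here, and a
text unit is taken to be a list of lines with the index of the line
on which the key ends already known.

```python
def format_hit(unit, hit_index, surround=25, line_mode=False):
    """Return the lines typed out for one hit.

    unit      -- list of lines making up the text unit
    hit_index -- index of the line in which the key was found
    surround  -- lines shown on either side (the SURROUND value)
    line_mode -- True when <delim> is LINE (no "*" or ">" prefixes)
    """
    first = max(0, hit_index - surround)
    last = min(len(unit) - 1, hit_index + surround)
    out = []
    if first > 0:
        out.append(". . .")          # unit has more lines before the window
    for i in range(first, last + 1):
        if line_mode:
            out.append(unit[i])
        else:
            out.append((">" if i == hit_index else "*") + unit[i])
    if last < len(unit) - 1:
        out.append(". . .")          # unit has more lines after the window
    return out


if __name__ == "__main__":
    for line in format_hit(["a", "b", "c", "d", "e"], 2, surround=1):
        print(line)
```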


WHAT GETS SEARCHED FOR

Here's where things get interesting.  Within the <key>, certain
characters have special interpretations, as listed below:

     comma	Separates two strings to be searched for simultaneously;
		that is, FIND FOO,BAR,BAZ will search for FOO, BAR, and
		BAZ.  Simultaneous searches like this take no more (or
		less) time than searching for a single string.
    letter	Matches either upper- or lower-case in the file.
     'xxx	Character with ascii code xxx (octal); e.g., '044=$.
		FIND q'015 will look for lines ending with "q" (or "Q").
     {xyz}	Any of the characters xyz; any number of characters may
		be given between the braces, and they may include any of
		the constructs listed here except comma, infinity, or
		another `embraced' string.  For instance, FIND ≡M{s¬∃≡,}
		searches for an upper-case M followed by either a
		lower-case s or an upper-case S or a CR, LF, tab, space,
		formfeed, or comma (see below regarding "≡", "¬", "∃").
       ∀	Any character.
       ∃	Any character except CR, LF, tab, space, or formfeed.
      ¬x	Any character except x (x can be a multi-character
		construct such as {xyz} or ∃).
       |	Equivalent to ¬∃, i.e., any of: CR, LF, tab, space, FF.
      ≡x	The character x (used to quote these special chars;
		can also be used to quote a letter to enforce either
		upper- or lower-case).
      ∞x	Any number (including zero) of repetitions of x (x can
		be a multi-character construct; see examples below).
     space	Equivalent to ∞|, i.e., zero or more spaces, tabs,
		CRLFs, and formfeeds; to match precisely one space,
		quote the space with `≡'.
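
For readers more used to modern regular expressions, the constructs
in the table can be approximated by a translation into Python's "re"
syntax.  The sketch below is an illustration only and is not how FIND
works internally.  The names key_to_regex and _atom are made up, octal
codes are assumed to be exactly three digits, an unquoted comma or ∞
inside braces is assumed to be taken literally, and malformed keys are
not checked.

```python
import re

WHITE = r"[\r\n\t \f]"   # CR, LF, tab, space, formfeed
ANY = r"[\s\S]"          # any character at all


def _atom(key, i, in_braces=False):
    """Translate one single-character construct at key[i].

    Returns (regex_fragment, index_of_next_unread_character).
    """
    c = key[i]
    if c == "≡":                                   # quoted character
        return re.escape(key[i + 1]), i + 2
    if c == "'":                                   # 'xxx octal code (3 digits assumed)
        return re.escape(chr(int(key[i + 1:i + 4], 8))), i + 4
    if c == "∀":
        return ANY, i + 1
    if c == "∃":
        return r"[^\r\n\t \f]", i + 1
    if c == "|":
        return WHITE, i + 1
    if c == "¬":                                   # any character except x
        inner, j = _atom(key, i + 1, in_braces)
        return "(?!" + inner + ")" + ANY, j
    if c == "{" and not in_braces:                 # any of the enclosed items
        parts, j = [], i + 1
        while key[j] != "}":
            part, j = _atom(key, j, in_braces=True)
            parts.append(part)
        return "(?:" + "|".join(parts) + ")", j + 1
    if c.isascii() and c.isalpha():                # letters match either case
        return "[" + c.lower() + c.upper() + "]", i + 1
    return re.escape(c), i + 1


def key_to_regex(key):
    """Compile a FIND-style <key> into an approximately equivalent regex."""
    out, i = [], 0
    while i < len(key):
        c = key[i]
        if c == ",":                               # simultaneous search
            out.append("|")
            i += 1
        elif c == " ":                             # space is ∞|
            out.append(WHITE + "*")
            i += 1
        elif c == "∞":                             # zero or more repetitions
            inner, i = _atom(key, i + 1)
            out.append("(?:" + inner + ")*")
        else:
            inner, i = _atom(key, i)
            out.append(inner)
    return re.compile("".join(out))


if __name__ == "__main__":
    print(key_to_regex("k∞{aeiou}k").pattern)
```

For instance, key_to_regex("weird,wierd") finds either spelling, and
key_to_regex("≡M{s¬∃≡,}") accepts "Ms" and "M," but not "ms".  Python's
matcher backtracks, so the timing remarks in this section do not carry
over to the sketch.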

Note: The time taken by the search is independent of the complexity of
the key, although extremely complex keys may take a few seconds to
initialise the search.  (This does not consider the time taken to print
the hits.)  For example, searching for any key in the dictionary (about
2.9 million characters of text) takes about 6.5 seconds of Ebox time.


PUTTING THE RESULTS IN A FILE

If you include a WRITING clause in your command, then the only thing
printed on your terminal will be the number of hits, after the entire
file has been searched.  The actual hits (if any) will be written in
the specified file.


THE "OMIT" CLAUSE

The OMIT (or OMITTING) clause in the syntax above lets you specify
certain characters to be ignored during the search.  The default is
'012'000, i.e., ignore linefeeds and nuls.  (Thus carriage returns may
be used as single-character delimiters around lines.)  The <omits>
string takes precedence over the <key>; i.e., FIND XYZ OMITTING Y is
guaranteed to find zero hits.

The <omits> string may include any of the special constructs listed
above for the <key>, except for the "∞" and "space" constructs.
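
The effect of the <omits> string can be pictured as deleting the
omitted characters from the text before any matching is done.  The
Python sketch below shows that idea for literal characters only; the
names strip_omits and count_hits are invented, and the key is treated
as a plain case-sensitive string rather than a full FIND <key>.

```python
DEFAULT_OMITS = "\n\x00"   # '012'000: linefeed and nul


def strip_omits(text, omits=DEFAULT_OMITS):
    """Drop every omitted character from the text."""
    return "".join(ch for ch in text if ch not in omits)


def count_hits(text, key, omits=DEFAULT_OMITS):
    """Count occurrences of a literal key after omitted characters are dropped."""
    return strip_omits(text, omits).count(key)


if __name__ == "__main__":
    # With linefeeds omitted, a CRLF line ending is seen as a bare CR.
    print(repr(strip_omits("Iraq\r\nnext\r\n")))
    # Omitting Y removes every Y, so a key containing Y can never match.
    print(count_hits("XYZ XYZ", "XYZ", omits="Y"))
```

The second call shows why FIND XYZ OMITTING Y cannot produce a hit:
once the Y's are gone from the text, nothing is left for the Y in the
key to match.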


THE "OFIND" COMMAND

If you use OFIND (currently abbreviatable to OF) instead of FIND in the
syntax above, it is exactly like FIND except that the OPTION.TXT file on
your login area is scanned for a line beginning "FIND:" (case of letters
is ignored) and, if found, the line is used to override various
defaults.  The format for the line is:

	FIND:[ WITHIN <delim>][ OMIT[TING] <omits>][ IN <file>][;]

Any fields not specified in OPTION.TXT retain their usual defaults as
defined in the preceding sections.  As a special case for compatibility
with an earlier version of FIND, the line may read:

	FIND:<file>[;]

to specify a default file without affecting the <delim> and <omits>.  If
you have more than one file you like to search a lot, you can set up
multiple OPTION.TXT lines of the form:

	FIND/<ident>: etc.

and the command OFIND/<ident> etc. will use the specified option line.
OFIND with no /<ident> will use the first FIND option line it comes to.
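
The lookup just described might be sketched in Python as follows.
This is an illustration under assumptions, not the real code: the
function name find_option_line is made up, the <ident> is assumed to
be compared without regard to case (the text above says so only for
"FIND:" itself), and "the first FIND option line" is read as the first
line whose name is FIND, with or without an /<ident>.

```python
def find_option_line(option_lines, ident=None):
    """Return the body of the matching FIND option line, or None.

    option_lines -- the lines of OPTION.TXT, in order
    ident        -- the <ident> from OFIND/<ident>, or None for plain OFIND
    """
    for line in option_lines:
        head, colon, rest = line.partition(":")
        if not colon:
            continue                      # no "name:" prefix on this line
        name, slash, line_ident = head.partition("/")
        if name.upper() != "FIND":
            continue
        if ident is None or (slash and line_ident.upper() == ident.upper()):
            body = rest.strip()
            if body.endswith(";"):        # optional trailing semicolon
                body = body[:-1].rstrip()
            return body
    return None


if __name__ == "__main__":
    options = [
        "MAIL: something else",
        "find/src: WITHIN PAGE IN MONCOM.BH;",
        "FIND: IN PHONE",
    ]
    print(find_option_line(options, "SRC"))
    print(find_option_line(options))
```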


EXAMPLES

	FIND [LES:

will print out the entry for LES in the phone directory.  (The "[" and
":" keep it from finding arbitrary words that happen to contain the
string "les", since in the directory the programmer name field is
surrounded by those characters.)

	FIND garply baz in ∂

will search your own mail file for "garply baz" and type out the
entire message(s) it occurs in.

	FIND ≡	ME in ∂*
    or	FIND '011ME in ∂*

(that's ≡<tab>ME in the first one) will find all system messages (in
NOTICE.TXT[2,2]) from ME.

	FIND president in ∂.nap

will search your News Service notification file on [2,2] for all
notifications containing "president".

	FIND RUN IN IN IN

will search for "RUN" (ignoring case) in a file with the unlikely name
"IN IN".  The ≡ construct can be used to override this, as in

	FIND RUN≡ IN IN IN

which searches for the phrase RUN IN in the file named "IN", since the
quoted space prevents the first "IN" from being taken as a file-name
lead-in.

	DFIND k∞{aeiou}k WRITING kk

will search for all words in the dictionary containing two k's (upper-
or lower-case) with nothing but vowels between them, and will put those
words into a newly-written file called KK on your area.

	DFIND |≡a∃∃∞∃{pt}¬{aeiou|}

will search for all words beginning with (due to the initial "|", which
matches only a CR, LF, tab, space, or formfeed) a lower-case `a',
followed by two or more non-delimiting characters, followed by either a
`p' or a `t', and then any non-delimiter other than a vowel.

More example commands:

FIND MCCARTHY
FIND TARGET BYTE in COMLIN.FAI
FIND LOMA VERDE
FIND EVENT IN ∂*
FIND ≡ E≡  IN ∂*
FIND WITHIN MSG PDP-11 IN OUTGO.MSG
FIND WITHIN PAGE UDPUFD IN MONCOM.BH[S,DOC
DFIND weird,wierd
FIND john∞∀lathrop

Example output for the last example command:

*McCarthy, John (Prof. John)  - Professor                [JMC: FR*,DIAL*,VB] 2000 200
*        AI207; 7-4430                                         Sep 4
>        846 Lathrop Dr., Stanford 94305, 321-7580

1 hit on key "john∞∀lathrop".


USING DFIND FROM E

The E editor has an extended command ⊗X DFIND that interfaces with the
FIND program.  (Note that the ⊗X FIND command in E is something else
entirely!)  E's DFIND command starts up a phantom job to do the search,
and a summary of the results is sent to your terminal.  The summary
includes the number of hits, as well as the first, last, and (if
different) shortest "hit" lines.

The syntax of the DFIND command in E is the same as that of the monitor
command, with the same defaults for the WITHIN, OMIT, and IN clauses.
If you use "DFIND/<ident>", the defaults are overridden using your
OPTION.TXT, as if you had invoked OFIND.  (This is not permitted with
the monitor command.)  If you specify a WITHIN other than the default of
WITHIN LINE, then the text units containing the first three hits
(instead of the first, last, and shortest) will be sent to you; this
will probably overflow the page printer at the bottom of E's window, so
you'll have to use <break>-N to see the whole text.  Only 3 lines on
either side of each hit are included (instead of the usual 25), though
you can as usual override this with the SURROUND clause.

If you want to see more than just the summary of hits, you can use the
WRITING clause to send all the hits to a file.

If you don't give anything on the DFIND command line in E (i.e., if you
just type ⊗X DFIND<cr>), the current line of text (or the first line of
attached text, if any) will be used to specify the search parameters.


FINAL NOTES

If you type just FIND (or DFIND or OFIND) without providing a <key>,
you'll get a summary of the command syntax and special <key> constructs.

The special sequences currently recognised within the <key> string allow
specification of a subset of regular expressions.  The main loop of the
FIND program could just as easily handle any regular expression without
slowing down any, but I don't feel like writing a parser for the darn
things.  If someone wants to provide a SAIL program that converts
regular expressions into a transition table for a finite state machine,
I'll consider building it into FIND.

Send comments, questions, gripes, etc., via GRIPE FIND.